ABSTRACT

The term biological motion was coined by G. Johansson (1973) to refer to the ambulatory
patterns of terrestrial bipeds and quadrupeds. In this paper a computational theory of the visual
perception of biological motion is proposed.

The specific problem addressed is how the three dimensional structure and motions of animal limbs
may be computed from the two dimensional motions of their projected images. It is noted that the
limbs of animals typically do not move arbitrarily during ambulation. Rather, for anatomical reasons,
they typically move in single planes for extended periods of time. This simple anatomical constraint is
exploited as the basis for utilizing a "planarity assumption" in the interpretation of biological motion.

The analysis proposed is: (1) divide the image into groups of two or three elements each; (2) test
each group for pairwise-rigid planar motion; (3) combine the results from (2). Fundamental to the
analysis are two 'structure from planar motion' propositions. The first states that the structure and
motion of two points rigidly linked and rotating in a plane is recoverable from three orthographic 
projections. The second states that the structure and motion of three points forming two hinged rods 
constrained to move in a plane is recoverable from two orthographic projections. The psychological 
relevance of the analysis and possible interactions with top down recognition processes are discussed. 

1. Introduction

The ambulatory patterns of terrestrial bipeds and quadrupeds have long borne a unique significance
for man among the variety of motions extant in his visual world. One's chances of survival in the
neighborhood of a potential predator are presumably increased if one can distinguish an aimless
meandering from a stealthy stalking or an outright run. More in line with our daily experience, we
can quickly infer from the pendulum-like motions of the limbs of a human whether he is walking,
running, or performing some other motion. We can detect small deviations in gait patterns such as
limps. Familiar individuals can often be recognized by the idiosyncrasies of their gait.

The term biological motion has been coined by G. Johansson (1973) to refer to this subset of visual 
motions. In this paper a computational theory 1 for the perception of biological motion is proposed. 

In developing this computational theory of biological motion we will also attempt to illustrate 
a research strategy that has been developed by investigators interested in providing computational 
descriptions of various aspects of human vision. The strategy may be schematized simply using six 
steps. 

First, human visual information processing is artificially parcelled into provisional independent 
modules for research tractability. 2 Next a "minimal information display", some highly impoverished 
visual display which clearly demonstrates a modular human visual ability, is devised. Third, once 
a minimal information display is found, the information available in the display is accurately and 
concisely described. Then the nature of the representations built by the visual system in consequence
of being presented with the display is specified precisely. 3 Fifth, since the information available
in the display generally is insufficient in principle to arrive at a unique representation of the type
presumably built by the visual system, plausible domain specific constraints about the nature of the
world are sought which will allow the construction of such a unique representation. Finally an
argument or constructive proof is devised to show that it is in principle possible to build a unique 
representation of the type desired given the information available in the display along with the a 
priori constraints about the world. 

Once the steps of the computational analysis are completed, specific algorithms are considered for 
detailed implementations of the computational theory. The implementations provide existence proofs 
that the theory is internally consistent and also provide running models which can be tested for their
psychological reality. 

1 The term computational theory is used in the sense proposed by Marr and Poggio (1977). Marr and Poggio observe
that to thoroughly understand a complex information processing system involves obtaining descriptions of the system 
on three relatively independent levels. The top level, the level of the computational theory, describes what is being 
computed and for what purpose. The second level, that of the algorithm, specifies the nature of the particular algorithm
used by the system in implementing the computational theory. The final level involves a description of the choice of 
hardware used in the system (eg, neurons versus digital components). 

2 Of course these modules are but interim constructs to be later richly interconnected in an ideally completed computational 
model of human vision. 

3 The nature of the ultimately desired representation is often inferred by noting what we see when shown the display. 
The desired representation may also be inferred in part by considerations of what in principle should be computed to 
reach certain goals. Marr and Nishihara (1978), for example, suggest what they call the 3-D model representation based
on considerations of what would be an optimal representation for the purpose of object recognition. 

Fortunately the first two steps in building a computational theory for the perception of biological
motion have already been completed by Johansson (1973, 1975). He first suggested that the percep- 
tion of biological motion may be an isolable submodule of visual perception, a module able to build 
rich descriptions of the structure and motions of animals with recourse only to the projected motions 
of a limited number of feature points. More specifically, he suggested that the perception of biological 
motion does not require any visual information about the form of the animal (ie, the outline due to its 
occluding contour), its texture, or color. 



2. A Minimal Information Display For Biological Motion 

Johansson devised a minimal information display to demonstrate that indeed the visual system 
can utilize motion information, with no further cues, to infer the correct structure and motion of an 
animal and often even to recognize which animal is being observed. The display is constructed as 
follows. Small light bulbs are attached to a subject's body at each of its joints (eg, ankle, knee, hip,
shoulder, elbow, wrist, etc.). The subject is then placed in a dark room and filmed while performing 
various activities. Single frames of the resulting film look to naive observers as merely pictures of
a few randomly placed dots. But when the film is shown at normal speeds naive observers almost 
immediately (within 100 to 1000 milliseconds) see the dots as a person walking, running etc. (See 
figure 1). In fact, the perception is so powerful that it is impossible to force oneself to interpret the
dots in any other manner. 

The imports of this demonstration are two-fold. The first is psychological. Humans have the per- 
ceptual ability to utilize the two dimensional motions of feature points to build accurate descriptions 
of the underlying multi-limbed object. Thus it is of interest to perceptual psychologists to understand 
how humans perform this conveniently circumscribed task. The second import is on a more general 
computational level. Since humans perform this perceptual task so reliably and quickly it must 
in principle be possible to perform. What we have here, in essence, is an existence proof of that 
fact. Therefore we can be confident that if we carefully characterize the informational input and the 
perceptual representations which are built in consequence of that informational input, there exists a 
computational procedure that maps from the former to the latter. Just such a characterization will be 
attempted next. 

Figure 1. A single frame of a typical biological motion movie showing a sideview of a person walking. In
(a) the dots are shown and in (b) their proper connections illustrated. 

3. Characterizing the Information Available 

Before a computational solution to the problem of biological motion is possible, one must make 
explicit the actual information available to the visual system (often called the "proximal stimulus") 
and the form of the ultimately desired representation. The desired representation will also be called 
the target representation. The actual information available to the visual system may be called the 
source representation. In this section we describe the source representation and in the next section the 
form of the target representation. The problem will be to find a mapping from the former to the latter. 

There appear to be at least four possible characterizations of the source representation (see figure 
2). These four characterizations arise from decisions about the appropriate models for (a) the nature 
of the projection from the world onto the image plane and (b) the nature of the representation of the
motion information. Although only one of the four characterizations will be used here, all four merit 
computational investigation. 

The available information will here be characterized as a series of temporally successive or- 
thographic snapshots. 4 In each snapshot what is explicitly represented is the two dimensional coor- 
dinates of the projections of the limb joints, such as the ankle, knee, and hip. Motion information is 

4 We assume that the correct correspondence of points in the successive snapshots has already been assigned. This 
problem is discussed in detail by Ullman (1979) and Marr (1981). 

Figure 2. Possible characterizations of the input information to the biological motion module. The choice
taken here is orthographic (parallel) projection with discrete motion. The other three characterizations are also
viable candidates which should be considered.

obtained by observing how the coordinates change from frame to frame. The actual coordinate system 
(eg, Cartesian, polar coordinates, etc.) used to represent the two dimensional coordinates of the joints 
is not a concern at this point and will be left undetermined. 
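
The source representation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Cartesian coordinates are used (the text leaves the coordinate system undetermined), the line of sight is taken to be the z axis, correspondence between frames is assumed to be given by joint name (in the spirit of footnote 4), and the joint coordinates are invented for the example.

```python
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def orthographic_snapshot(joints: Dict[str, Point3D]) -> Dict[str, Point2D]:
    """Project joints along the z axis: x and y survive, depth z is discarded."""
    return {name: (x, y) for name, (x, y, _z) in joints.items()}


def frame_to_frame_motion(
    snapshots: List[Dict[str, Point2D]],
) -> List[Dict[str, Point2D]]:
    """Displacement of every joint between each pair of successive snapshots."""
    motions = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        motions.append(
            {name: (curr[name][0] - prev[name][0], curr[name][1] - prev[name][1])
             for name in prev}
        )
    return motions


# Two instants of a hypothetical leg; the numbers are made up for illustration.
leg_t0 = {"hip": (0.0, 1.0, 0.0), "knee": (0.1, 0.5, 0.2), "ankle": (0.0, 0.0, 0.1)}
leg_t1 = {"hip": (0.5, 1.0, 0.0), "knee": (0.7, 0.5, 0.3), "ankle": (0.4, 0.0, 0.4)}

snapshots = [orthographic_snapshot(leg_t0), orthographic_snapshot(leg_t1)]
motion = frame_to_frame_motion(snapshots)
```

Only `snapshots` (and the displacements derived from them) would be available to the interpretation scheme; the z values in `leg_t0` and `leg_t1` are exactly what has to be recovered.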



4. Defining the Target Representation 

Two major considerations are involved when trying to specify a plausible target representation 
for the interpretation of biological motion. First, what do people perceive when presented with the 
minimal information display? Second, what information should be made explicit in the representation 
to facilitate attaining plausible goals of the observer? 

The argument from perception is simple. When shown a biological motion display one perceives 
the three dimensional structure and motion of the limbs. Presumably then one must represent the 
three dimensional structure and motion of the limbs. This suggests that three dimensional primitives 
are appropriate for the target representation.

The computational argument is more involved. One plausible utilization of the target
representation, though certainly not the only one, is in shape recognition. Marr and Nishihara (1978) examine
the problem of designing a representation that is in some sense optimal for recognizing shapes. Based 
on representational design issues and on several criteria for judging the usefulness of a representation 
for shape recognition they suggest a three dimensional representation based on a shape's natural 
axes which they call a 3-D model. Marr and Vaina (1980) extend these arguments to the case of 
recognizing moving shapes. 

Based on the argument from perception and on the considerations raised by Marr and Nishihara 
we suggest that a plausible target representation is a three dimensional description of structure and 
motion akin to what Marr and Nishihara call a 3-D model. Specifically, what is to be computed is the 
length (in three dimensions) of each limb segment, the joint angle (in three dimensions) between each 
limb segment and both its successor and predecessor, and how these angles change over time. 
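
As a rough illustration of the quantities named in this target representation, the following sketch computes a three dimensional segment length and a joint angle from joint positions. The function names and sample coordinates are assumptions made for the example; this is not the 3-D model data structure of Marr and Nishihara.

```python
import math
from typing import Sequence


def segment_length(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean length of the limb segment joining 3-D points p and q."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))


def joint_angle(end_a: Sequence[float], joint: Sequence[float],
                end_b: Sequence[float]) -> float:
    """Angle in radians at `joint` between segments joint->end_a and joint->end_b."""
    u = [a - j for a, j in zip(end_a, joint)]
    v = [b - j for b, j in zip(end_b, joint)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    cos_angle = dot / (segment_length(joint, end_a) * segment_length(joint, end_b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))


# Hypothetical joint positions (arbitrary units).
hip, knee, ankle = (0.0, 3.0, 4.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)
thigh = segment_length(knee, hip)
knee_angle = joint_angle(hip, knee, ankle)
```

Evaluating `joint_angle` on successive frames would give the change of each angle over time.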

The computational problem may now be precisely formulated. We would like to find a mapping
from a finite number of two dimensional orthographic projections of the endpoints of the limbs of a
moving animal to a three dimensional representation of the structure and motion of the animal which
Marr and Nishihara have called a 3-D model.

5. The Planarity Assumption

Unfortunately there is no unique mapping from a series of frames of a biological motion movie 
to a 3-D model. The set of candidate three dimensional representations which may consistently be 
paired with the two dimensional source data is infinite. What we have is a fundamental ambiguity of 
interpretation. 

To make the nature of this ambiguity clear we introduce the notion of a pairwise-rigid structure 
(see figure 3). A pairwise-rigid structure is a set of points moving in space so that each point remains 
at a constant distance from at least one other point, and no three points are in a rigid configuration.
Intuitively a pairwise-rigid structure is a set of rigid rods joined end to end in hinge joints with no 
three rods forming a triangle. Consequently arms and legs qualify as pairwise-rigid structures. 

It can be shown that an infinite number of different pairwise-rigid structures can give rise to the
same sequence of two dimensional projections regardless of the size of the sequence (Flinchbaugh, 
1980). The ambiguity derives from the fact that there are an infinite number of rigid interpretations 
consistent with the motions of two distinct points in an image sequence. Knowledge of the exact posi- 
tion and motion of one of the points does not resolve the ambiguity. Thus even if an interpretation is 
chosen for one pair of points in a pairwise-rigid structure, an infinity of alternatives still remains for 
every other pair of points in the structure. 
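
The ambiguity for a single pair of points can be made concrete with a small sketch. It assumes one point O stays at the image origin and that depth is measured along the line of sight; the function name and the sample projections are illustrative and are not taken from Flinchbaugh's argument. For any rod length at least as large as the longest projected length, depths can be chosen so that the three dimensional length is constant in every frame.

```python
import math
from typing import List, Optional, Sequence, Tuple


def rigid_depths(projections: Sequence[Tuple[float, float]],
                 length: float) -> Optional[List[float]]:
    """Depths z_i that make |OA_i| equal to `length` in every frame.

    Returns None when `length` is shorter than some projected length.
    Each depth could equally be negated, which only adds to the ambiguity.
    """
    depths = []
    for x, y in projections:
        z_sq = length ** 2 - x * x - y * y
        if z_sq < 0:
            return None
        depths.append(math.sqrt(z_sq))
    return depths


# Projected positions of the free endpoint in three frames (made-up values).
projections = [(1.0, 0.0), (0.5, 0.5), (0.0, 0.8)]

# Every admissible length yields a different, equally rigid interpretation.
interpretations = {L: rigid_depths(projections, L) for L in (1.0, 2.0, 3.0)}
```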

To overcome this ambiguity we need to incorporate plausible constraints about the nature of the
world into our interpretation scheme. More specifically, what we would like is a plausible constraint
on the motions of the limbs of animals because, as we have seen, unless the motion of a pairwise-rigid 
structure is constrained it cannot be given a unique interpretation. 

One candidate motion constraint is the rigidity constraint (Ullman, 1979). Ullman proves that 
"Given three distinct orthographic views of four non-coplanar points in a rigid configuration, the 
structure and motion compatible with the three views are uniquely determined." He then proposes an
interpretation scheme based on a rigidity assumption which states: 

"Any set of elements undergoing a 2-D transformation which has a unique interpretation as a rigid 
body moving in space, should be interpreted as such a body in motion."

The rigidity constraint is sufficient to give a unique interpretation if the object observed is moving 
rigidly. However the objects of interest here, namely animal limbs, violate the requirement of having
four rigidly moving non-coplanar points. All rigidly connected points on a limb are not only coplanar,
they are collinear. If a unique interpretation for biological motion is to be obtained, a constraint other
than rigidity is required. 

We propose to exploit an anatomical constraint on the motions of most bipeds and quadrupeds as
the basis of an interpretation scheme for biological motion. Casual observation reveals that in general
the limbs of an ambulating animal do not move about arbitrarily. Rather, for anatomical reasons, each
limb tends to move approximately in a single plane for extended periods of time. That is, joints tend
to allow rotation more or less about a line. As will be discussed in the next section, this anatomical 
constraint is sufficient to provide a unique interpretation for biological motion. 

Figure 3. This figure illustrates a pairwise-rigid structure constrained to move in one plane. This pairwise-
rigid structure is composed of two rigid rods (though in general a pairwise-rigid structure can have anywhere 
from one to an infinity of rigid rods) with endpoints at A and B and a common endpoint at the joint O. 
The only motion allowed is a change in the angle φ, translation in the plane spanned by OA and OB, and
rotation in that plane. (In general pairwise-rigid structures are not subject to these motion constraints). 

Motivated by the observation of this anatomical constraint, the principle we propose for the
interpretation of biological motion is what we shall call the planarity assumption: 6



Any set of elements undergoing a 2-D transformation which has a unique interpretation as a
pairwise-rigid structure moving in one plane, should be interpreted as such a body in motion.



6 Although the most obvious application of the planarity assumption is in the interpretation of biological motion, we 
do not intend to imply that utilization of the assumption is restricted to the interpretation of biological motion. Rather 
we suggest it is a general principle for interpreting visual motion that, like the rigidity principle, is used by the visual
system whenever appropriate. 

6. Interpreting Visual Motion Utilizing the Planarity Assumption

The planarity assumption is employed in interpreting visual motion by looking at groups of two or 
three points and checking if they have a unique interpretation as a pairwise-rigid structure constrained
to move in a plane. If not, no interpretation is assigned. If so, the planar interpretation is provisionally 
accepted as correct. 

As is the case with Ullman's rigidity assumption for the recovery of three dimensional structure 
and motion, the planarity assumption must be shown to be immune to 'false targets' and 'phantom
structures'. A false target occurs when a collection of points that does not constitute a pairwise-rigid
structure in planar motion gives rise to a series of orthographic projections which are consistent with
the planar interpretation. A phantom structure occurs when a collection of points that does constitute
a pairwise-rigid structure in planar motion gives rise to a series of orthographic projections which are
consistent with more than one planar interpretation. Ullman's proof (1977, appendix 1) that false
targets occur only with probability zero also holds for the planar case. The proof that there can be no 
phantom structures follows from the following 'structure from planar motion' propositions. 

The structure from planar motion propositions

Proposition 1: Given three distinct orthographic projections of the two endpoints of a rigid rod
which is constrained to rotate in a plane, the structure and motion compatible with the three views are
uniquely determined. 7

Proposition 2: Given two distinct orthographic projections of the three endpoints of two rigid rods
linked in a hinge joint to form a pairwise-rigid structure which is constrained to move in one plane,
the structure and motion compatible with the two views are uniquely determined.

The proofs for these propositions are outlined in appendices one and two respectively. The proofs 
are constructive and thus provide algorithms for the computation of the structure and motion.

The interpretation scheme 

The interpretation scheme based on these propositions is as follows. (1) Divide the image into 
groups of two or three elements each. The appropriate elements for the interpretation of biological 
motion seem to be the joints of the limbs of an animal, such as the ankle, knee, and hip. (2) Test each 
group for pairwise-rigid planar motion. For groups of two elements proposition 1 may be applied. For 
groups of three elements proposition 2 may be applied. (3) Combine the results from (2). 

Some potential objections to the scheme 

Some potential objections to this scheme should be considered. First, it appears that the most this 
scheme can deliver is the three dimensional structure and motion of the limbs of an animal. The 
trunk typically violates both the rigidity assumption and the planarity assumption. This may or may 
not be a serious objection. Two avenues are worth exploring on this problem. First, perhaps further 

7 Since orthographic projection is used the structure and motion are uniquely determined up to a reflection about the
image plane.

natural constraints in the spirit of the rigidity and planarity assumptions can be found to aid in the
bottom up interpretation of trunk structure. In general, bottom up avenues of interpretation should 
be exhausted before recourse to top down schemes is taken. With this consideration in mind a second 
interesting possibility exists. Perhaps the limb structure and motion obtained bottom up using the 
planarity assumption is sufficient to provide a unique index into a stored table of 3-D models of 
animals. The interpretation of biological motion would then involve an interaction of both bottom up 
and top down processes. The bottom up processes get the interpretation process off the ground and 
the top down processes complete the interpretation of those structures which resist bottom up attack. 

Another objection to this scheme might be raised. The planarity assumption may work quite nicely 
when the object observed is performing some simple repetitive activity such as running, walking, or 
jogging. But how about more complicated activities? Johansson, for example, has minimal informa- 
tion displays of a dancing couple which we seem able to interpret, though with a bit more difficulty. 
A good portion of the time the couple is badly violating the planarity assumption when, for example,
they spin or turn. 

This objection brings up several interesting points. First it should be noted that the planar inter- 
pretation scheme does not provide spurious interpretations when the planarity assumption is violated. 
The scheme can determine when the assumption is valid and when it is not. 8 When it is invalid, no 
interpretation is made. 

To make the second point we divide the dancing sequence into three categories depending upon 
which assumptions the couple's movements obey. During part of the sequence, for example when 
the partners step toward or away from each other, their movements conform to the planarity
assumption. During these movements the planarity scheme can uniquely determine the three dimensional
structure and motions of the dancers' limbs. At other times the dancers spin with many of their limbs 
held in one position during the spin. Under these conditions the rigidity assumption holds and three 
dimensional structure may be computed. But there are definitely periods when the motion clearly 
violates both the rigidity and planarity assumptions. During these periods the bottom up processes 
proposed so far will simply not be able to give an interpretation. What could happen perceptually 
during these periods? There are two possibilities. First, the visual system could utilize the structural
information obtained during periods obeying the planarity or rigidity assumptions to interpret the 
motion during the periods of violation. It can be shown that if the three dimensional structure of 
a pairwise-rigid object is known, then its motion can be inferred uniquely even when the planarity
constraint is violated. The second possibility is simply that no interpretation is made during these 
periods of violation. From observing these dancing displays it appears that the latter possibility is
what often happens. At the moment a dancer starts a spin, we momentarily lose the structure and
motion only to regain it later during periods of planar motion.

Linking the feature points

One advantage accrues to the planarity scheme somewhat as a side effect. A persistent problem for

8 This is a point that may have escaped some researchers who have objected to the use of elaborate assumptions to 
aid in the interpretation of the visual world. Generally, schemes based on elaborate assumptions are able to check in 
a bottom up manner whether or not their assumptions are valid. A second point is worth mentioning. The world is
structured. Why shouldn't the visual system exploit that structure in interpreting the visual world? 

investigators of biological motion has been to get the correct two dimensional linking of the points.
For example, how can we go about linking the ankle point to the knee and the knee to the hip without 
also introducing an incorrect link between the ankle and hip? Simple solutions like nearest neighbor 
connections simply do not work. Rashid (1979) sets out specifically to compute the correct two dimen- 
sional links based on graph-theoretic cluster analysis of the two dimensional positions and velocities 
of the points. Webb (1980) starts his analysis of structure from biological motion by assuming that the 
correct two dimensional links are already known. The planarity scheme does not make the computa- 
tion of the correct two dimensional linking of the points a specific goal. Instead the three dimensional 
structure and motion are computed and the two dimensional linkage then falls out incidentally. 

A simple example may help to see this. Suppose we have several views of just three feature points: 
the ankle, knee, and hip. We would like to determine if there is a unique interpretation of these points 
as a pairwise-rigid structure in planar motion using the second proposition that two views of three
points are sufficient for our purpose. First we submit the three points to an implementation of
proposition 2 with the ankle tagged as the provisional pivot point. The routine returns with no interpretation.
Next we tag the hip as the provisional joint of a pairwise-rigid structure in planar motion. Again no
interpretation is returned. Finally we ask if there is an interpretation with the knee as the pivot point.
The routine returns the three dimensional distance between the knee and hip, between the knee and
ankle, the motion of these limbs, and the plane of the motion. Consequently we know that this is
the correct interpretation, and we know the correct three dimensional structure and motion. But note
that we also know, as a side effect, that there is no link between the ankle and hip feature points.
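
The pivot search in this example can be sketched as a short loop. The sketch assumes a solver for proposition 2 is supplied as a callable; its signature `(pivot, end_a, end_b, frames)` is hypothetical, and `stub_solver` below is only a stand-in that mimics the outcome described in the text. It does not implement the construction of appendix 2.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]
Frame = Dict[str, Point2D]
# Hypothetical solver signature: returns an interpretation, or None if there is none.
Solver = Callable[[str, str, str, Sequence[Frame]], Optional[dict]]


def find_pivot(names: Sequence[str], frames: Sequence[Frame],
               solve: Solver) -> Optional[Tuple[str, List[Tuple[str, str]], dict]]:
    """Try each of three points as the hinge; return (pivot, links, interpretation)."""
    for pivot in names:
        end_a, end_b = [n for n in names if n != pivot]
        interpretation = solve(pivot, end_a, end_b, frames)
        if interpretation is not None:
            # Both rods meet at the pivot, so end_a and end_b are not linked.
            links = [(pivot, end_a), (pivot, end_b)]
            return pivot, links, interpretation
    return None


def stub_solver(pivot: str, end_a: str, end_b: str,
                frames: Sequence[Frame]) -> Optional[dict]:
    """Stand-in that accepts only the knee as hinge; NOT proposition 2 itself."""
    return {"plane": "placeholder"} if pivot == "knee" else None


# Tried in the order used in the text: ankle, then hip, then knee.
result = find_pivot(["ankle", "hip", "knee"], frames=[], solve=stub_solver)
```

With a real solver in place of the stub, the returned `links` would carry the two dimensional linkage as a by-product of the three dimensional interpretation.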

A psychophysical prediction 

Some previous algorithms require that each limb be seen at least once in its full extension so that 
its projected length is the same as its length in three dimensions. The planarity scheme clearly predicts
that it is not necessary to see any of the limbs in maximal extension to infer the correct structure and
motion. The critical psychophysical experiment on this issue is trivial. One simply views a biological
motion display where none of the limbs reaches maximal extension. When this is done the perception
of the biological motion is not at all reduced.



7. Summary 

The visual interpretation of biological motion has been investigated using a computational ap- 
proach. Anatomical constraints on how the limbs of animals typically move during ambulation were 
exploited as the basis for an interpretation scheme based on an assumption of planar motion. Two
"structure from planar motion" propositions were proved, providing explicit computational methods
for implementing the planarity scheme.

Appendix 1. The Structure From Planar Motion Proposition For Two Points

Proposition: Given three distinct orthographic projections of the two endpoints of a rigid rod 
which is constrained to move in a plane, the structure and motion compatible with the three views are 
uniquely determined (up to a reflection about the image plane). 

Outline of Proof: Let $O$ be the fixed endpoint of the rigid rod and let $A_1$, $A_2$, and $A_3$ be the positions of its
other endpoint in frames one through three respectively (see figure 4). Let $\mathbf{a}_i$ be the vector from $O$ to $A_i$
in frame $i$. Let the coordinates of $\mathbf{a}_i$ be $(x_i, y_i, z_i)$. Under orthographic projection the x and y coordinates
of each vector remain unaltered and the z coordinates are lost completely. Thus the problem consists of recovering the three
unknown z coordinates. We first show that there are at most four solutions (ie, two solutions plus their
reflections) for the z coordinates given three views, and then show that there is a unique solution.

Note that in figure 4 the reference point O does not translate over the three views. This does not 
imply a loss of generality. Two types of translation are possible. The first, translation in depth, is in 
principle unrecoverable under orthographic projection. The second, translation parallel to the image
plane, yields projected translations identical to the translation of the object in the world. Since these
translations are trivially recovered, they are ignored in this analysis. 



From the fact that the length (in three dimensions, not in the image) of $\mathbf{a}$ is invariant over the three
views we obtain the two equations: 9

$$\|\mathbf{a}_1\| = \|\mathbf{a}_2\| \tag{1}$$

$$\|\mathbf{a}_1\| = \|\mathbf{a}_3\| \tag{2}$$

Three vectors lie in a plane if and only if their triple scalar product is zero. From the planarity 
constraint we obtain the equation: 10 

$$[\mathbf{a}_1 \mathbf{a}_2 \mathbf{a}_3] = 0 \tag{3}$$

Equations (1), (2), and (3) may be expanded into polynomial equations in terms of their z coor- 
dinates giving: 

$$z_1^2 - z_2^2 + k_1 = 0 \tag{4}$$

$$z_1^2 - z_3^2 + k_2 = 0 \tag{5}$$

$$k_3 z_1 + k_4 z_2 + k_5 z_3 = 0 \tag{6}$$

The k's in these equations are expressions entirely in the x and y coordinates of the position
vectors. 11 Since these quantities are available directly from the orthographic projections they are 
lumped together into constants. The goal here is to solve these three equations for the three z 
coordinates. 
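
As a numerical sanity check on these constraints, the sketch below builds three views of a rod rotating about O in a tilted plane, computes the five constants from the image coordinates alone (using the expressions for the k's given in footnote 11), and evaluates the left-hand sides of the expanded length and planarity equations, z1^2 - z2^2 + k1, z1^2 - z3^2 + k2 and k3 z1 + k4 z2 + k5 z3. The rod length, tilt and rotation angles are arbitrary choices for illustration.

```python
import math


def k_constants(p1, p2, p3):
    """The five constants, computed from image coordinates only."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    k1 = x1 * x1 + y1 * y1 - x2 * x2 - y2 * y2
    k2 = x1 * x1 + y1 * y1 - x3 * x3 - y3 * y3
    k3 = x2 * y3 - x3 * y2
    k4 = x3 * y1 - x1 * y3
    k5 = x1 * y2 - x2 * y1
    return k1, k2, k3, k4, k5


def residuals(ks, zs):
    """Left-hand sides of the two length equations and the planarity equation."""
    k1, k2, k3, k4, k5 = ks
    z1, z2, z3 = zs
    return (z1 * z1 - z2 * z2 + k1,
            z1 * z1 - z3 * z3 + k2,
            k3 * z1 + k4 * z2 + k5 * z3)


def rod_endpoint(theta, tilt, length=1.0):
    """Free endpoint of a rod rotating about O in a plane tilted about the x axis."""
    return (length * math.cos(theta),
            length * math.sin(theta) * math.cos(tilt),
            length * math.sin(theta) * math.sin(tilt))


views = [rod_endpoint(t, tilt=0.7) for t in (0.3, 1.0, 1.9)]
ks = k_constants(*[(x, y) for x, y, _z in views])
res = residuals(ks, [z for _x, _y, z in views])  # all three should be ~0
```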

The solution space for the three z coordinates in three views can be visualized as the mutual inter- 
section points of two hyperboloid sheets and one plane passing through the origin. This is illustrated 
in figure 5.

The simple fact that we have three equations and three unknowns here does not mean that this 
system has a finite number of solutions. To ascertain if there are a finite number of solutions we 
apply the inverse function theorem. This theorem allows us to conclude that wherever the Jacobian 
of these equations is nonsingular the mapping defined by the equations is locally one to one and onto 

9 The notation $\|\mathbf{a}_1\|$ is vector shorthand for the length of the vector $\mathbf{a}_1$. In terms of the components of $\mathbf{a}_1$ this length
may be expressed $\sqrt{x_1^2 + y_1^2 + z_1^2}$.

10 The triple scalar product of three vectors $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{a}_3$ is indicated by the shorthand $[\mathbf{a}_1 \mathbf{a}_2 \mathbf{a}_3]$. Taking the triple scalar
product involves first taking the vector cross product of $\mathbf{a}_2$ and $\mathbf{a}_3$ and then taking the dot product of the resulting
vector with $\mathbf{a}_1$. Intuitively the triple scalar product gives the volume of the parallelepiped formed by the vectors $\mathbf{a}_1$,
$\mathbf{a}_2$, and $\mathbf{a}_3$.

11 The actual expressions for the k's are $k_1 = x_1^2 + y_1^2 - x_2^2 - y_2^2$, $k_2 = x_1^2 + y_1^2 - x_3^2 - y_3^2$, $k_3 = x_2 y_3 - x_3 y_2$,
$k_4 = x_3 y_1 - x_1 y_3$, $k_5 = x_1 y_2 - x_2 y_1$.

(ie, a local diffeomorphism). Consequently any roots at points where the Jacobian is nonsingular are
isolated and not part of a continuum of solutions. 

The determinant of the Jacobian of (1) - (3), taken in their expanded forms (4) - (6) with respect
to $z_1$, $z_2$, $z_3$, is:

$$\begin{vmatrix} 2z_1 & -2z_2 & 0 \\ 2z_1 & 0 & -2z_3 \\ k_3 & k_4 & k_5 \end{vmatrix}$$

This Jacobian has rank three.^12 If (4) - (6) involved transcendental functions the most we could
conclude from this Jacobian test would be that the set of solutions was of measure zero. However (4)
- (6) are polynomials. Consequently we can assert that the system of equations has but a finite set of
solutions in general. By Bezout's theorem^13 we know that the sum of the multiplicities of the solutions
does not exceed the product of the degrees of the equations, which in this case is four.
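The Jacobian test can be tried numerically. The sketch below, under the assumption of a hypothetical unit rod rotating in a plane tilted 40 degrees about the x axis, evaluates the determinant of the Jacobian of (4) - (6) at the true depths and compares it with its cofactor expansion, $4(k_3 z_2 z_3 + k_4 z_1 z_3 + k_5 z_1 z_2)$. All names are illustrative.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def jacobian(k3, k4, k5, z):
    """Jacobian of (4), (5), (6) with respect to (z1, z2, z3)."""
    z1, z2, z3 = z
    return [[2 * z1, -2 * z2, 0.0],
            [2 * z1, 0.0, -2 * z3],
            [k3, k4, k5]]

def rotating_rod(theta, rho=1.0, tilt=math.radians(40)):
    """Hypothetical test rod rotating in a plane tilted about the x axis."""
    return (rho * math.cos(theta),
            rho * math.sin(theta) * math.cos(tilt),
            rho * math.sin(theta) * math.sin(tilt))

(x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = [rotating_rod(t) for t in (0.3, 0.9, 1.7)]
k3 = x2 * y3 - x3 * y2
k4 = x3 * y1 - x1 * y3
k5 = x1 * y2 - x2 * y1

det_value = det3(jacobian(k3, k4, k5, (z1, z2, z3)))
closed_form = 4 * (k3 * z2 * z3 + k4 * z1 * z3 + k5 * z1 * z2)
print(det_value, closed_form)
```

For this configuration the determinant is clearly nonzero, so the true solution is an isolated root, as the inverse function theorem argument requires.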

We have shown that there are at most four real solutions given three views of the two points. These
four solutions come in two pairs, with the two members of a given pair being the reflections about the
image plane of each other.

We now prove the solution is unique up to a reflection.^14 Solve (6) for $z_1$, substitute into (4)
and (5), and simplify.

$$(k_4^2 - k_3^2)z_2^2 + k_5^2 z_3^2 + 2k_4 k_5 z_2 z_3 + k_1 k_3^2 = 0 \tag{7}$$

$$k_4^2 z_2^2 + (k_5^2 - k_3^2)z_3^2 + 2k_4 k_5 z_2 z_3 + k_2 k_3^2 = 0 \tag{8}$$

Multiply (7) by $k_2$ and (8) by $k_1$. Subtract (8) from (7). Divide the result by $z_3^2$ and let
$x = z_2/z_3$.

$$[k_2(k_4^2 - k_3^2) - k_1 k_4^2]x^2 + 2k_4 k_5(k_2 - k_1)x + [k_2 k_5^2 - k_1(k_5^2 - k_3^2)] = 0 \tag{9}$$

Solve (9) for x.

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \tag{10}$$

where $a = k_2(k_4^2 - k_3^2) - k_1 k_4^2$, $b = 2k_4 k_5(k_2 - k_1)$, and $c = k_2 k_5^2 - k_1(k_5^2 - k_3^2)$.

12 The actual expressions for the k's in terms of x's and y's must be used when determining the rank of the Jacobian.
Otherwise hidden dependencies among the variables may escape notice. One can find all the degenerate cases (i.e., cases
when the Jacobian drops rank and no unique solution is possible) by factoring the determinant of the Jacobian and
setting the factors equal to zero.

13 For a nontechnical discussion of the inverse function theorem and Bezout's theorem see Richards, Rubin, and Hoffman
(1981).

14 Horn (1981, personal communication) first proved uniqueness of the solution. He noted that the two points in planar
motion trace out a circle in space. This circle maps into an ellipse with known center under orthographic projection.
Three points on the ellipse determine its three parameters: the major and minor axes, and the angle of the major
axis. He made a similar construction for the case of two views of three points.



Before continuing we establish one claim.

Claim: Provided the plane of rotation of the rod is not parallel to the image plane and that none of the
three projected images of the rod are collinear, at most one of the solutions for x is valid.

Proof: Let the length of the rod be $\rho$. Let the projected lengths of the rod in views 2 and 3 be,
respectively, $r_2$ and $r_3$. Then $z_2 = \pm\sqrt{\rho^2 - r_2^2}$ and $z_3 = \pm\sqrt{\rho^2 - r_3^2}$. Consequently

$$x = \frac{z_2}{z_3} = \pm\frac{\sqrt{\rho^2 - r_2^2}}{\sqrt{\rho^2 - r_3^2}} \tag{11}$$

Thus if x has two solutions, then these two solutions must have the same absolute value and
opposite sign if both are to be valid. From (10) we conclude that x will have two valid solutions only
when

$$\frac{-b + \sqrt{b^2 - 4ac}}{2a} = -\,\frac{-b - \sqrt{b^2 - 4ac}}{2a} \tag{12}$$



which is true only when $b = 0$. From the expression for b given with (10) we see this implies the
following degenerate conditions: $k_4 = 0$, $k_5 = 0$, $k_2 = k_1$. $k_4$ can be interpreted as the dot product
of the projected image of the rod in view three with a vector orthogonal to the projected image of the
rod in view one. $k_5$ can be interpreted as the dot product of the projected image of the rod in view one
with a vector orthogonal to the projected image of the rod in view two. Thus $k_4$ or $k_5$ is zero only if
the appropriate projected images of the rod are collinear. $k_1 = k_2$ implies that $r_2 = r_3$. This can
happen if the plane of rotation of the rod is parallel to the image plane or if the projected images in
views two and three are collinear. Thus, except for these degenerate conditions, x must have a unique
valid solution.

QED. 
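The degenerate conditions named in the proof are easy to test for before attempting a solution. The following is a small sketch under stated assumptions: the function names and the tolerance `tol` are hypothetical choices, and a real implementation would have to pick a tolerance suited to the noise in its image measurements.

```python
import math

def rod_constants(p1, p2, p3):
    """Return (k1, ..., k5) from the projected rod vectors (x_i, y_i) of three views."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return (x1**2 + y1**2 - x2**2 - y2**2,
            x1**2 + y1**2 - x3**2 - y3**2,
            x2 * y3 - x3 * y2,
            x3 * y1 - x1 * y3,
            x1 * y2 - x2 * y1)

def degenerate_conditions(k, tol=1e-9):
    """List which of the conditions k4 = 0, k5 = 0, k1 = k2 hold (within tol)."""
    k1, k2, _, k4, k5 = k
    found = []
    if abs(k4) < tol:
        found.append("k4 = 0")   # images in views one and three collinear
    if abs(k5) < tol:
        found.append("k5 = 0")   # images in views one and two collinear
    if abs(k2 - k1) < tol:
        found.append("k1 = k2")  # equal projected lengths in views two and three
    return found

def projected_rod(theta, tilt):
    """Image of a hypothetical unit rod rotating in a plane tilted about the x axis."""
    return (math.cos(theta), math.sin(theta) * math.cos(tilt))

angles = (0.3, 0.9, 1.7)
generic = degenerate_conditions(rod_constants(*[projected_rod(t, math.radians(40)) for t in angles]))
parallel = degenerate_conditions(rod_constants(*[projected_rod(t, 0.0) for t in angles]))
collinear = degenerate_conditions(rod_constants((1.0, 0.5), (2.0, 1.0), (0.3, 0.9)))
print(generic, parallel, collinear)
```

The second case places the plane of rotation parallel to the image plane, and the third makes the images in views one and two collinear.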



Substitute $z_3 x$ for $z_2$ in (7). This can be done since $x = z_2/z_3$. Note that x is now one of two
known values.

$$(k_4^2 - k_3^2)x^2 z_3^2 + k_5^2 z_3^2 + 2k_4 k_5 x z_3^2 + k_1 k_3^2 = 0 \tag{13}$$

The solution for $z_3$ is

$$z_3 = \pm\sqrt{\frac{-k_1 k_3^2}{(k_4^2 - k_3^2)x^2 + k_5^2 + 2k_4 k_5 x}} \tag{14}$$

The solutions for $z_2$ and $z_1$ follow immediately: $z_2 = x z_3$, $z_1 = \pm\sqrt{z_3^2 - k_2}$, with the sign of
$z_1$ fixed by (6). By the claim we know that only one of the two values of x is valid, except in
degenerate cases. Thus the solutions for $z_1$, $z_2$, $z_3$ are unique up to a reflection.

In practice we can find solutions for the z's using both values of x and reject the pair
of solutions which is either imaginary or which violates the conditions established in the
claim.
QED.
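The procedure of equations (9), (10), and (14) can be sketched in Python as follows. This is a minimal illustration, assuming noise-free image coordinates and a nondegenerate configuration; the function name `solve_rod`, the tolerance, and the synthetic test rod are assumptions of the example. Here $z_1$ is obtained from (6) rather than from a square root, which fixes its sign directly.

```python
import math

def solve_rod(p1, p2, p3, tol=1e-12):
    """Return the real depth triples (z1, z2, z3) consistent with three projected views."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    k1 = x1**2 + y1**2 - x2**2 - y2**2
    k2 = x1**2 + y1**2 - x3**2 - y3**2
    k3 = x2 * y3 - x3 * y2
    k4 = x3 * y1 - x1 * y3
    k5 = x1 * y2 - x2 * y1
    # Coefficients of the quadratic (9) in x = z2 / z3.
    a = k2 * (k4**2 - k3**2) - k1 * k4**2
    b = 2 * k4 * k5 * (k2 - k1)
    c = k2 * k5**2 - k1 * (k5**2 - k3**2)
    disc = b * b - 4 * a * c
    if abs(a) < tol or abs(k3) < tol or disc < 0:
        return []  # degenerate or no real root; not handled in this sketch
    solutions = []
    for x in ((-b + math.sqrt(disc)) / (2 * a), (-b - math.sqrt(disc)) / (2 * a)):
        denom = (k4**2 - k3**2) * x**2 + k5**2 + 2 * k4 * k5 * x
        if abs(denom) < tol:
            continue
        z3_squared = -k1 * k3**2 / denom      # equation (14), squared
        if z3_squared < 0:
            continue                          # imaginary pair: reject
        z3 = math.sqrt(z3_squared)
        z2 = x * z3
        z1 = -(k4 * z2 + k5 * z3) / k3        # equation (6) solved for z1
        solutions.append((z1, z2, z3))
        solutions.append((-z1, -z2, -z3))     # reflection about the image plane
    return solutions

def rotating_rod(theta, rho=1.0, tilt=math.radians(40)):
    """Hypothetical test rod rotating in a plane tilted about the x axis."""
    return (rho * math.cos(theta),
            rho * math.sin(theta) * math.cos(tilt),
            rho * math.sin(theta) * math.sin(tilt))

views = [rotating_rod(t) for t in (0.3, 0.9, 1.7)]
truth = tuple(z for _, _, z in views)
sols = solve_rod(*[(x, y) for x, y, _ in views])
print(truth, sols)
```

For this test rod one root of (9) yields an imaginary $z_3$ and is rejected, leaving the true depths and their reflection.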




Figure 5. The solution space for the coordinates $z_1$, $z_2$, and $z_3$. The solution can be seen here to be
the mutual intersection of two hyperboloid sheets rotated ninety degrees with respect to each other
and a plane passing through the origin. The asymptotic lines of the hyperboloid sheets are always at
forty five degrees with respect to the $z_1$ axis. (The limbs of the hyperboloids on the other side of the
$z_2 z_3$ plane are not shown.)

Figure 6. Geometry underlying the proof of proposition 2. 

Appendix 2. The Structure From Planar Motion Proposition For Three Points 

Proposition: Given two distinct orthographic projections of the three endpoints of two rigid rods
linked to form a pairwise-rigid structure which is constrained to move in a plane, the structure and
motion compatible with the two views are uniquely determined (up to a reflection about the image
plane).

Outline of Proof: Let $O$, $A_i$, $B_i$ be the endpoints of the two rigid rods (which form a joint at $O$) in
frame i, where i = 1, 2. (See figure 6.) Let $\mathbf{a}_i$ be the vector from $O$ to $A_i$ and $\mathbf{b}_i$ be the vector
from $O$ to $B_i$. Let the coordinates of $\mathbf{a}_i$ be $(x_{ai}, y_{ai}, z_{ai})$. Let the coordinates of $\mathbf{b}_i$ be
$(x_{bi}, y_{bi}, z_{bi})$. Under orthographic projection the x and y coordinates of each vector remain unaltered
and the z coordinates are lost completely. Thus the problem consists of recovering the four unknown
coordinates $z_{ai}$ and $z_{bi}$. We first show that there are but a finite number of solutions for the z
coordinates given only two views, and then show that the solution is actually unique up to a reflection.

From the fact that the lengths (in three dimensions, not in the image) of a and b remain invariant
over the two views we obtain the two equations:^15

$$\|\mathbf{a}_1\| = \|\mathbf{a}_2\| \tag{1}$$

$$\|\mathbf{b}_1\| = \|\mathbf{b}_2\| \tag{2}$$

15 See the footnotes in appendix 1 for an explanation of the vector notation used in these equations.



Three vectors lie in a plane if and only if their triple scalar product is zero. From the planarity 
constraint we obtain the two equations: 



$$[\mathbf{a}_1 \mathbf{b}_1 \mathbf{a}_2] = 0 \tag{3}$$

$$[\mathbf{a}_1 \mathbf{b}_1 \mathbf{b}_2] = 0 \tag{4}$$
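The planarity test behind (3) and (4) can be illustrated with a few lines of Python. This is only a sketch of the triple scalar product as defined in the footnotes of appendix 1 (a dot product with a cross product); the helper names are chosen for this example.

```python
def cross(u, v):
    """Vector cross product u x v of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Dot product of two 3-vectors."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def triple(a, b, c):
    """Triple scalar product [a b c] = a . (b x c)."""
    return dot(a, cross(b, c))

# Three coplanar vectors (the third is the sum of the first two): product is zero.
coplanar = triple((1, 2, 3), (4, 5, 6), (5, 7, 9))
# The unit basis vectors span a unit cube: product is the volume, one.
unit_volume = triple((1, 0, 0), (0, 1, 0), (0, 0, 1))
print(coplanar, unit_volume)
```

With measured data the product will rarely be exactly zero, so a practical test would compare its magnitude against a tolerance.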



Equations (1) - (4) may be expanded into polynomial equations in terms of their four z coordinates,
giving:

$$z_{a1}^2 - z_{a2}^2 + k_1 = 0 \tag{5}$$

$$z_{b1}^2 - z_{b2}^2 + k_2 = 0 \tag{6}$$

$$k_3 z_{a1} + k_4 z_{b1} + k_5 z_{a2} = 0 \tag{7}$$

$$k_6 z_{a1} + k_7 z_{b1} + k_8 z_{b2} = 0 \tag{8}$$

The k's in these equations are expressions entirely in the x and y coordinates of the position
vectors. Since these quantities are available directly from the orthographic projections they are
lumped together into constants. The goal here is to solve these four equations for the four z
coordinates.
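The text does not list the k's for the three point case. The sketch below uses expressions obtained by expanding (1) - (4) directly, in the same way as for appendix 1; these expansions, the function names, and the synthetic hinged rods (lengths 1 and 0.8 in a plane tilted 40 degrees about the x axis) are assumptions of this example rather than formulas quoted from the source. The residual check confirms that the true depths satisfy (5) - (8).

```python
import math

def planar_coeffs(r1, r2, r3):
    """Coefficients of z_r1, z_r2, z_r3 in the triple scalar product [r1 r2 r3]."""
    (x1, y1), (x2, y2), (x3, y3) = r1, r2, r3
    return (x2 * y3 - x3 * y2, x3 * y1 - x1 * y3, x1 * y2 - x2 * y1)

def hinged_constants(a1, b1, a2, b2):
    """Return (k1, ..., k8) from the projected vectors of the two rods in two views."""
    k1 = a1[0]**2 + a1[1]**2 - a2[0]**2 - a2[1]**2
    k2 = b1[0]**2 + b1[1]**2 - b2[0]**2 - b2[1]**2
    k3, k4, k5 = planar_coeffs(a1, b1, a2)   # expansion of [a1 b1 a2]
    k6, k7, k8 = planar_coeffs(a1, b1, b2)   # expansion of [a1 b1 b2]
    return k1, k2, k3, k4, k5, k6, k7, k8

def residuals(k, za1, za2, zb1, zb2):
    """Left-hand sides of equations (5) - (8)."""
    k1, k2, k3, k4, k5, k6, k7, k8 = k
    return (za1**2 - za2**2 + k1,
            zb1**2 - zb2**2 + k2,
            k3 * za1 + k4 * zb1 + k5 * za2,
            k6 * za1 + k7 * zb1 + k8 * zb2)

def rod(theta, rho, tilt=math.radians(40)):
    """Hypothetical rod of length rho at angle theta in a plane tilted about the x axis."""
    return (rho * math.cos(theta),
            rho * math.sin(theta) * math.cos(tilt),
            rho * math.sin(theta) * math.sin(tilt))

a1, a2 = rod(0.3, 1.0), rod(0.9, 1.0)     # rod OA in views 1 and 2
b1, b2 = rod(2.0, 0.8), rod(2.5, 0.8)     # rod OB in views 1 and 2
k = hinged_constants(a1[:2], b1[:2], a2[:2], b2[:2])
res = residuals(k, a1[2], a2[2], b1[2], b2[2])
print(res)
```

Under this expansion $k_8$ equals $k_5$, since both are the coefficient contributed by the pair $\mathbf{a}_1$, $\mathbf{b}_1$.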

The simple fact that there are four equations and four unknowns does not imply that this system
has a finite number of solutions. To ascertain if there are a finite number of solutions we apply
the inverse function theorem. This theorem lets us conclude that wherever the Jacobian of these
equations is nonsingular the mapping defined by the equations is locally one to one and onto (i.e.,
a local diffeomorphism). This means that any roots at points where the Jacobian is nonsingular are
isolated and not part of a continuum of solutions.

The determinant of the Jacobian of these four equations, with columns ordered $z_{a1}$, $z_{a2}$, $z_{b1}$,
$z_{b2}$, is:

$$\begin{vmatrix} 2z_{a1} & -2z_{a2} & 0 & 0 \\ 0 & 0 & 2z_{b1} & -2z_{b2} \\ k_3 & k_5 & k_4 & 0 \\ k_6 & 0 & k_7 & k_8 \end{vmatrix}$$

This Jacobian has rank four. If (5) - (8) involved transcendental functions the most we could
conclude from this Jacobian test would be that the set of solutions was at most of measure zero. However
(5) - (8) are polynomials. Consequently we can assert that the system of equations has but a finite
set of solutions in general. By Bezout's theorem^16 we know that the sum of the multiplicities of the
solutions does not exceed the product of the degrees of the equations, which in this case is four.

We have shown that there are at most four real solutions given two views of the three points. These
four solutions come in two pairs, with the two members of a given pair being the reflections about the
image plane of each other. The proof that the solution is unique is almost identical to that given in
appendix one and will not be reiterated here.
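Since the uniqueness argument is not spelled out for this case, the following Python sketch shows one way the elimination of appendix 1 might carry over. It is an assumed adaptation, not the authors' stated procedure: (7) and (8) are solved for $z_{a2}$ and $z_{b2}$, the results are substituted into (5) and (6), the constant terms are eliminated, and the resulting quadratic in $x = z_{a1}/z_{b1}$ is solved. The k expressions are the direct expansions of (1) - (4), and all names and the test configuration are hypothetical.

```python
import math

def planar_coeffs(r1, r2, r3):
    """Coefficients of z_r1, z_r2, z_r3 in the triple scalar product [r1 r2 r3]."""
    (x1, y1), (x2, y2), (x3, y3) = r1, r2, r3
    return (x2 * y3 - x3 * y2, x3 * y1 - x1 * y3, x1 * y2 - x2 * y1)

def solve_hinged(a1, b1, a2, b2, tol=1e-12):
    """Return real depth tuples (z_a1, z_a2, z_b1, z_b2) for two views of two hinged rods."""
    k1 = a1[0]**2 + a1[1]**2 - a2[0]**2 - a2[1]**2
    k2 = b1[0]**2 + b1[1]**2 - b2[0]**2 - b2[1]**2
    k3, k4, k5 = planar_coeffs(a1, b1, a2)
    k6, k7, k8 = planar_coeffs(a1, b1, b2)
    if min(abs(k5), abs(k8)) < tol:
        return []  # degenerate; not handled in this sketch
    # (5), (6) after substituting z_a2, z_b2 from (7), (8): conics in p = z_a1, q = z_b1.
    A1, B1, C1, D1 = k5**2 - k3**2, -2 * k3 * k4, -k4**2, k1 * k5**2
    A2, B2, C2, D2 = -k6**2, -2 * k6 * k7, k8**2 - k7**2, k2 * k8**2
    # Eliminate the constants to get a quadratic in x = p / q.
    a, b, c = D2 * A1 - D1 * A2, D2 * B1 - D1 * B2, D2 * C1 - D1 * C2
    disc = b * b - 4 * a * c
    if abs(a) < tol or disc < 0:
        return []
    out = []
    for x in ((-b + math.sqrt(disc)) / (2 * a), (-b - math.sqrt(disc)) / (2 * a)):
        denom = A1 * x * x + B1 * x + C1
        if abs(denom) < tol:
            continue
        q_squared = -D1 / denom
        if q_squared < 0:
            continue                          # imaginary pair: reject
        q = math.sqrt(q_squared)
        p = x * q
        sol = (p, -(k3 * p + k4 * q) / k5, q, -(k6 * p + k7 * q) / k8)
        out.append(sol)
        out.append(tuple(-v for v in sol))    # reflection about the image plane
    return out

def rod(theta, rho, tilt=math.radians(40)):
    """Hypothetical rod of length rho at angle theta in a plane tilted about the x axis."""
    return (rho * math.cos(theta),
            rho * math.sin(theta) * math.cos(tilt),
            rho * math.sin(theta) * math.sin(tilt))

a1, a2 = rod(0.3, 1.0), rod(0.9, 1.0)
b1, b2 = rod(2.0, 0.8), rod(2.5, 0.8)
truth = (a1[2], a2[2], b1[2], b2[2])
sols = solve_hinged(a1[:2], b1[:2], a2[:2], b2[:2])
print(truth, sols)
```

For this configuration only one root of the quadratic gives real depths, so the sketch returns the true solution and its reflection.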



16 For a nontechnical discussion of the inverse function theorem and Bezout's theorem see Richards, Rubin, and Hoffman 
(1981). 
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